CSHARP-5603: Add Big Endian support in BinaryVectorReader and BinaryVectorWriter #1682

Open
wants to merge 9 commits into
base: main
medhatiwari
Contributor

@medhatiwari medhatiwari commented May 5, 2025

Description

This PR adds Big Endian support for System.Single (Float32) to the BinaryVectorWriter.WriteToBytes() method.

Background

While running the MongoDB.Bson.Tests test suite on a Big Endian (s390x) system, we encountered 34 consistent test failures within the BinaryVectorSerializerTests class.
Each failure was caused by a System.NotSupportedException indicating that binary vector data of float32 type is not yet supported on Big Endian architectures.

Exception Observed

System.NotSupportedException: Binary vector data is not supported on Big Endian architecture yet.

Sample Failing Tests

Some of the test cases that failed due to this limitation include:

BinaryVectorSerializerTests.BinaryVectorSerializer_should_deserialize_bson_vector<Float32>

BinaryVectorSerializerTests.BinaryVectorSerializer_should_serialize_bson_vector<Float32>

BinaryVectorSerializerTests.ArrayAsBinaryVectorSerializer_should_deserialize_bson_vector<Float32>

BinaryVectorSerializerTests.ArrayAsBinaryVectorSerializer_should_serialize_bson_vector<Float32>

BinaryVectorSerializerTests.MemoryAsBinaryVectorSerializer_should_serialize_bson_vector<Float32>

BinaryVectorSerializerTests.MemoryAsBinaryVectorSerializer_should_deserialize_bson_vector<Float32>

BinaryVectorSerializerTests.ReadOnlyMemoryAsBinaryVectorSerializer_should_serialize_bson_vector<Float32>

BinaryVectorSerializerTests.ReadOnlyMemoryAsBinaryVectorSerializer_should_deserialize_bson_vector<Float32>

Why This Fix Is Necessary

This limitation prevented the test suite from passing on Big Endian platforms such as s390x. Adding Big Endian support for float32 serialization:

Enables consistent behavior across architectures

Completes existing deserialization support added earlier in BinaryVectorReader.cs

Changes Introduced

Added Big Endian branch to BinaryVectorWriter.WriteToBytes() for T == float.

Used BinaryPrimitives.WriteSingleBigEndian() to write bytes in the correct order.

Left existing Little Endian logic untouched to preserve behavior.
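
A minimal sketch of an endianness-independent float32 write and read, for illustration only. It assumes the float32 payload is stored little-endian on the wire, as the ReadSingleLittleEndian / WriteSingleLittleEndian helper names in the review snippets suggest. The class name, the union struct, and the "length" parameter name are hypothetical and are not the driver's actual BinaryPrimitivesCompat code.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper; not the driver's implementation.
public static class Float32LittleEndianSketch
{
    // Overlapping fields reinterpret the same 4 bytes as float or int.
    [StructLayout(LayoutKind.Explicit)]
    private struct FloatIntUnion
    {
        [FieldOffset(0)] public float FloatValue;
        [FieldOffset(0)] public int IntValue;
    }

    public static void WriteSingleLittleEndian(Span<byte> destination, float value)
    {
        if (destination.Length < 4)
        {
            throw new ArgumentOutOfRangeException("length", "Destination span is too small to hold a float.");
        }

        var bits = new FloatIntUnion { FloatValue = value }.IntValue;

        // Shifts operate on the numeric value, so the byte order is the same on any host.
        destination[0] = (byte)bits;
        destination[1] = (byte)(bits >> 8);
        destination[2] = (byte)(bits >> 16);
        destination[3] = (byte)(bits >> 24);
    }

    public static float ReadSingleLittleEndian(ReadOnlySpan<byte> source)
    {
        if (source.Length < 4)
        {
            throw new ArgumentOutOfRangeException("length", "Source span is too small to contain a float.");
        }

        var bits = source[0] | (source[1] << 8) | (source[2] << 16) | (source[3] << 24);
        return new FloatIntUnion { IntValue = bits }.FloatValue;
    }

    // Layout assumed here: [data type byte][padding byte][4 bytes per float].
    public static byte[] WriteFloat32Vector(ReadOnlySpan<float> values, byte dataType, byte padding)
    {
        var result = new byte[2 + values.Length * 4];
        result[0] = dataType;
        result[1] = padding;

        var output = result.AsSpan(2);
        for (var i = 0; i < values.Length; i++)
        {
            WriteSingleLittleEndian(output.Slice(i * 4, 4), values[i]);
        }

        return result;
    }
}
```

Because the bytes are produced with shifts rather than a memory reinterpretation of the whole span, the same output is expected on s390x and on x64; the MemoryMarshal.Cast fast path stays valid only on Little Endian hosts.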

cc: @giritrivedi

@medhatiwari medhatiwari requested a review from a team as a code owner May 5, 2025 10:53
@medhatiwari medhatiwari requested review from rstam and removed request for a team May 5, 2025 10:53
@BorisDog BorisDog requested review from BorisDog and removed request for rstam May 5, 2025 20:14
@medhatiwari medhatiwari force-pushed the binaryvectorsupport branch from 5cd9ca1 to a4384e3 Compare May 6, 2025 09:00
Signed-off-by: Medha Tiwari <[email protected]>
@medhatiwari medhatiwari force-pushed the binaryvectorsupport branch from a4384e3 to 2c2cae1 Compare May 6, 2025 09:01
@medhatiwari
Contributor Author

Hi @BorisDog, if everything is fine, can this be merged?

@medhatiwari
Contributor Author

Hi @BorisDog, just following up to check if there's any update on this PR. Please let me know if any further changes are needed.

@medhatiwari medhatiwari requested a review from BorisDog May 26, 2025 06:01
Contributor

@BorisDog BorisDog left a comment

Review is pending on requested changes.

@medhatiwari medhatiwari requested a review from BorisDog May 28, 2025 12:52
Contributor

@BorisDog BorisDog left a comment

The tests fail on net472.

@medhatiwari medhatiwari force-pushed the binaryvectorsupport branch from ee0aa0a to 530ecda Compare May 29, 2025 13:16
@medhatiwari medhatiwari requested a review from BorisDog May 29, 2025 14:37
Contributor

@BorisDog BorisDog left a comment

Looks good! Tests are passing as well.
A few styling comments, plus test improvements.

public void ReadSingleLittleEndian_should_throw_on_insufficient_length()
{
var shortBuffer = new byte[3];
Assert.Throws<ArgumentOutOfRangeException>(() =>

Please switch to Record.Exception (some examples in BinaryVectorSerializerTests.cs)

{
return MemoryMarshal.Cast<T, byte>(span).ToArray();
}
int elementSize = Marshal.SizeOf<T>();

Please use var where possible.

throw new NotSupportedException("Binary vector data is not supported on Big Endian architecture yet.");
}
case BinaryVectorDataType.Float32:
var length = vectorData.Length * sizeof(float);

We can just use vectorData.Length * 4.
The Float32 format is defined as 32 bits, and 4 is hardcoded in all other places.

resultBytes[1] = padding;

var floatSpan = MemoryMarshal.Cast<TItem, float>(vectorData);
Span<byte> floatOutput = resultBytes.AsSpan(2);

minor: var

@@ -35,15 +36,41 @@ public static byte[] WriteToBytes<TItem>(BinaryVector<TItem> binaryVector)
public static byte[] WriteToBytes<TItem>(ReadOnlySpan<TItem> vectorData, BinaryVectorDataType binaryVectorDataType, byte padding)
where TItem : struct
{
if (!BitConverter.IsLittleEndian)
byte[] resultBytes;

This should be defined in the Float32 case.
It can also be simplified to result.

#endif
}

// This layout trick allows safely reinterpreting float as int and vice versa.

No need for the comment.

@medhatiwari medhatiwari requested a review from BorisDog May 30, 2025 09:49
Contributor

@BorisDog BorisDog left a comment

Few more minor comments.

case BinaryVectorDataType.Float32:
byte[] result;
var length = vectorData.Length * 4;
result = new byte[2 + length];

var result = new byte[2 + length]; is sufficient.

Comment on lines 130 to 131
result = new float[count];
for (int i = 0; i < count; i++)

Please use 4 spaces instead of a tab.

BinaryPrimitivesCompat.ReadSingleLittleEndian(shortBuffer));

exception.Should().BeOfType<ArgumentOutOfRangeException>();
exception.Message.Should().Contain("length");
Contributor

@BorisDog BorisDog May 30, 2025

Please use the following pattern:

var e = exception.Should().BeOfType<ArgumentOutOfRangeException>().Subject;
e.ParamName.Should().Be("length");

and in WriteSingleLittleEndian_should_throw_on_insufficient_length as well.

Also, this seems to be the reason for the ReadSingleLittleEndian_should_throw_on_insufficient_length and WriteSingleLittleEndian_should_throw_on_insufficient_length failures on net472.

#else
if (source.Length < 4)
{
throw new ArgumentOutOfRangeException(nameof(source), "Source span is too small to contain a float.");

nameof(source.Length)?
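
For reference, a small sketch of what the two nameof forms evaluate to. In C#, nameof on a member access yields only the last identifier, so nameof(source.Length) is "Length" with a capital L, which is worth keeping in mind when a test asserts on ParamName. The class and method names below are hypothetical and exist only to show the values.

```csharp
using System;

// Hypothetical demo type; not part of the driver.
public static class NameofSketch
{
    // nameof on a member access returns the member's simple name.
    public static string MemberName(ReadOnlySpan<byte> source) => nameof(source.Length); // "Length"

    // nameof on the parameter returns the parameter's name.
    public static string ParameterName(ReadOnlySpan<byte> source) => nameof(source); // "source"

    // Throws the way the compat branch might, so the resulting ParamName can be inspected.
    public static void ThrowIfTooShort(ReadOnlySpan<byte> source)
    {
        if (source.Length < 4)
        {
            throw new ArgumentOutOfRangeException(nameof(source.Length), "Source span is too small to contain a float.");
        }
    }
}
```

Calling NameofSketch.ThrowIfTooShort(new byte[3]) produces an ArgumentOutOfRangeException whose ParamName is "Length", so an assertion against the lowercase "length" would need either a string literal in the throw or a matching expectation in the test.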

@BorisDog BorisDog changed the title Add Big Endian Support for Float32 in BinaryVectorWriter.WriteToBytes<T>() CSHARP-5603: Add Big Endian support in BinaryVectorReader and BinaryVectorWriter May 30, 2025
@medhatiwari medhatiwari force-pushed the binaryvectorsupport branch from 4078bfa to c547a15 Compare May 30, 2025 19:00
@medhatiwari medhatiwari requested a review from BorisDog May 30, 2025 19:02
2 participants